Statistics: All Roads Lead to Hypothesis Testing

Whether you are testing a coin, a correlation, or a variance, every hypothesis test asks exactly the same question. Once you see this, the subject stops feeling like a collection of unrelated procedures and becomes one repeating idea.

The unifying idea — every time

Assume the null hypothesis H₀ is true. Ask: how probable is it that we would see data at least this extreme? If that probability is small enough, we have evidence against H₀.

For a two-tailed test at significance level α, split the region equally: compare each tail to α/2. For tests that use a critical value from tables (PMCC, Spearman, Wilcoxon), the table already encodes this threshold — the underlying logic is identical.

1. Binomial distribution

A-level Maths: Edexcel, AQA, OCR, OCR MEI

Used when counting the number of successes in a fixed number of independent trials, each with the same probability of success. The classic example is testing whether a coin (or process) is biased.

Setup

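Under H₀, the test statistic X (number of successes) follows X ~ B(n, p₀), where p₀ is the probability of success stated in H₀.

Decision rule

For H₁: p > p₀, reject H₀ if P(X ≥ x) ≤ α, where x is the observed count. For H₁: p < p₀, reject H₀ if P(X ≤ x) ≤ α. For H₁: p ≠ p₀, compare the tail on the observed side with α/2.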

Worked example

A die is suspected of showing six too often. In 30 rolls, six appears 9 times. Test at the 5% significance level.

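H₀: p = 1/6, H₁: p > 1/6. Under H₀, X ~ B(30, 1/6), so P(X ≥ 9) = 1 − P(X ≤ 8) ≈ 1 − 0.9494 = 0.0506.

Since 0.0506 > 0.05, there is (narrowly) insufficient evidence at the 5% level that the die is biased towards six.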

Try it yourself

A coin is tossed 20 times and lands heads 14 times. Test at the 5% level whether the probability of heads exceeds 0.5.

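Answer

H₀: p = 0.5, H₁: p > 0.5. Under H₀, X ~ B(20, 0.5), so P(X ≥ 14) = 1 − P(X ≤ 13) ≈ 1 − 0.9423 = 0.0577.

Since 0.0577 > 0.05, there is insufficient evidence at the 5% level that the coin is biased towards heads.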
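The tail probability in the coin question can be checked numerically. The following is a minimal Python sketch using only the standard library; the function name `binomial_upper_tail` is illustrative and not part of the original material.

```python
from math import comb

def binomial_upper_tail(n: int, p: float, x: int) -> float:
    """Return P(X >= x) for X ~ B(n, p) by summing the exact probabilities."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))

# Coin question: 20 tosses, 14 heads, H0: p = 0.5, H1: p > 0.5, alpha = 0.05
p_value = binomial_upper_tail(20, 0.5, 14)
reject_h0 = p_value <= 0.05
print(f"P(X >= 14) = {p_value:.4f}, reject H0: {reject_h0}")
# prints: P(X >= 14) = 0.0577, reject H0: False
```

The same function handles any upper-tailed binomial test; a lower tail can be obtained as 1 minus the upper tail starting at x + 1.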

2. Normal distribution (z-test)

A-level Maths: Edexcel, AQA, OCR, OCR MEI

Used to test the mean of a normally distributed population when the population variance σ² is known. The sample mean X̄ is itself normally distributed.

Setup

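Under H₀: μ = μ₀, the sample mean satisfies X̄ ~ N(μ₀, σ²/n), so Z = (X̄ − μ₀)/(σ/√n) ~ N(0, 1).

Decision rule

Reject H₀ if the observed z lies in the critical region: z ≥ z_α for H₁: μ > μ₀, z ≤ −z_α for H₁: μ < μ₀, or |z| ≥ z_(α/2) for H₁: μ ≠ μ₀. For example, the critical value is 1.645 for a 5% one-tailed test and 1.96 for a 5% two-tailed test.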

Worked example

The lengths of bolts are normally distributed with σ = 1.2 mm. The target mean is μ = 25 mm. A sample of 36 bolts has mean 25.5 mm. Is there evidence at 1% that the mean has increased?

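H₀: μ = 25, H₁: μ > 25. z = (25.5 − 25)/(1.2/√36) = 0.5/0.2 = 2.5. The 1% one-tailed critical value is 2.326.

Since 2.5 > 2.326 (p ≈ 0.0062), there is sufficient evidence at the 1% level that the mean length has increased.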

Try it yourself

Packages are claimed to weigh μ = 500 g with known σ = 10 g. A sample of 25 packages gives x̄ = 496 g. Test at 5% (two-tailed) whether the mean weight has changed.

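Answer

H₀: μ = 500, H₁: μ ≠ 500. z = (496 − 500)/(10/√25) = −4/2 = −2. The 5% two-tailed critical values are ±1.96.

Since |−2| > 1.96 (p ≈ 0.0455), there is sufficient evidence at the 5% level that the mean weight has changed.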
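Both z-test questions in this section reduce to one standardisation step followed by a normal tail probability. A minimal sketch, assuming Python's standard `statistics.NormalDist`; the helper name `z_statistic` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Standardise the sample mean under H0: mu = mu0 with known sigma."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

std_normal = NormalDist()  # N(0, 1)

# Bolts: one-tailed (upper) test at 1%
z_bolts = z_statistic(25.5, 25, 1.2, 36)
p_bolts = 1 - std_normal.cdf(z_bolts)

# Packages: two-tailed test at 5%, so double the smaller tail
z_pack = z_statistic(496, 500, 10, 25)
p_pack = 2 * std_normal.cdf(-abs(z_pack))

print(f"bolts: z = {z_bolts:.2f}, p = {p_bolts:.4f}, reject H0: {p_bolts <= 0.01}")
print(f"packages: z = {z_pack:.2f}, p = {p_pack:.4f}, reject H0: {p_pack <= 0.05}")
```

Comparing the p-value with α is equivalent to comparing z with the tabulated critical value; either form of the decision rule gives the same conclusion.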

3. Product moment correlation coefficient (PMCC)

A-level Maths: Edexcel, AQA, OCR, OCR MEI

Tests whether there is linear correlation between two variables in a bivariate normal population. The null hypothesis is always that the population correlation ρ is zero.

Setup

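The test statistic is the sample PMCC r. The critical values of r are read from tables for the given n and significance level; for a one-tailed test they implicitly encode P(R ≥ r_crit | ρ = 0) = α.

Decision rule

For H₁: ρ > 0, reject H₀ if r ≥ r_crit. For H₁: ρ < 0, reject H₀ if r ≤ −r_crit. For H₁: ρ ≠ 0, reject H₀ if |r| ≥ r_crit, using the two-tailed critical value.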

Worked example

For n = 10 data pairs, r = 0.648. Is there evidence of positive correlation at 5%?

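H₀: ρ = 0, H₁: ρ > 0. From tables, the 5% one-tailed critical value for n = 10 is 0.5494.

Since 0.648 > 0.5494, there is sufficient evidence at the 5% level of positive correlation.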

Try it yourself

For n = 15 data pairs, r = 0.52. Test at 5% for positive correlation.

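Answer

H₀: ρ = 0, H₁: ρ > 0. From tables, the 5% one-tailed critical value for n = 15 is 0.4409.

Since 0.52 > 0.4409, there is sufficient evidence at the 5% level of positive correlation.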

4. Poisson distribution

A-level Further Maths: Edexcel (Further Statistics 1), AQA (Statistics), OCR (Statistics), OCR MEI (Statistics)

Used when counting events that occur randomly in a fixed interval of time or space. Tests whether the underlying rate λ has changed from a known baseline.

Setup

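Under H₀, the number of events X in the interval follows X ~ Po(λ₀), where λ₀ is the baseline rate for that interval.

Decision rule

For H₁: λ > λ₀, reject H₀ if P(X ≥ x) ≤ α, where x is the observed count. For H₁: λ < λ₀, reject H₀ if P(X ≤ x) ≤ α. For H₁: λ ≠ λ₀, compare the tail on the observed side with α/2.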

Worked example

A call centre receives an average of λ = 4 calls per minute. In one minute, 9 calls arrive. Test at 5% whether the rate has increased.

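H₀: λ = 4, H₁: λ > 4. Under H₀, X ~ Po(4), so P(X ≥ 9) = 1 − P(X ≤ 8) ≈ 1 − 0.9786 = 0.0214.

Since 0.0214 < 0.05, there is sufficient evidence at the 5% level that the call rate has increased.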

Try it yourself

Faults occur at an average rate of λ = 2 per hour. In one hour, 6 faults are recorded. Test at 5% whether the rate has increased.

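Answer

H₀: λ = 2, H₁: λ > 2. Under H₀, X ~ Po(2), so P(X ≥ 6) = 1 − P(X ≤ 5) ≈ 1 − 0.9834 = 0.0166.

Since 0.0166 < 0.05, there is sufficient evidence at the 5% level that the fault rate has increased.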
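The Poisson tail probabilities for the call-centre and fault questions can be reproduced with a few lines of standard-library Python. This is a sketch only; `poisson_upper_tail` is an illustrative name.

```python
from math import exp, factorial

def poisson_upper_tail(lam: float, x: int) -> float:
    """Return P(X >= x) for X ~ Po(lam), computed as 1 - P(X <= x - 1)."""
    return 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(x))

# Call centre: lambda0 = 4, observed 9 calls, H1: lambda > 4
p_calls = poisson_upper_tail(4, 9)
# Faults: lambda0 = 2, observed 6 faults, H1: lambda > 2
p_faults = poisson_upper_tail(2, 6)

print(f"calls:  P(X >= 9) = {p_calls:.4f}, reject H0: {p_calls <= 0.05}")
print(f"faults: P(X >= 6) = {p_faults:.4f}, reject H0: {p_faults <= 0.05}")
# prints: calls:  P(X >= 9) = 0.0214, reject H0: True
# prints: faults: P(X >= 6) = 0.0166, reject H0: True
```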

5. Geometric distribution

A-level Further Maths: AQA (Statistics), OCR MEI (Statistics)

Models the number of trials needed to achieve the first success. Useful for testing whether the underlying probability of success has changed.

Setup

Under H₀, X = number of trials to first success has distribution X ~ Geo(p₀), with P(X = x) = (1 − p₀)^(x−1) p₀ and P(X ≥ x) = (1 − p₀)^(x−1).

A lower probability of success means we expect to wait longer, so H₁: p < p₀ corresponds to large values of X.

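Decision rule

For H₁: p < p₀, reject H₀ if P(X ≥ x) = (1 − p₀)^(x−1) ≤ α. For H₁: p > p₀, reject H₀ if P(X ≤ x) = 1 − (1 − p₀)^x ≤ α.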

Worked example

A machine produces defective items with probability p = 0.3. An engineer suspects the fault rate has fallen. The first defective item is found on the 10th item inspected. Test at 5%.

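H₀: p = 0.3, H₁: p < 0.3. P(X ≥ 10) = P(first 9 items are not defective) = 0.7⁹ ≈ 0.0404.

Since 0.0404 < 0.05, there is sufficient evidence at the 5% level that the fault rate has decreased.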

Try it yourself

A machine has a defect probability of p = 0.4. After maintenance, the first defective item appears on the 8th item inspected. Test at 5% whether the defect probability has decreased.

Answer

H₀: p = 0.4, H₁: p < 0.4. P(X ≥ 8) = 0.6⁷ ≈ 0.0280.

Since 0.0280 < 0.05, there is sufficient evidence at the 5% level that the defect probability has decreased.
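For the geometric test the upper tail has a closed form: waiting at least x trials means the first x − 1 trials all failed. A minimal sketch, with the illustrative helper name `geometric_upper_tail`:

```python
def geometric_upper_tail(p: float, x: int) -> float:
    """Return P(X >= x) for X ~ Geo(p): the first x - 1 trials all fail."""
    return (1 - p) ** (x - 1)

# Worked example: p0 = 0.3, first defective on the 10th item, H1: p < 0.3
p_machine = geometric_upper_tail(0.3, 10)
# Try-it question: p0 = 0.4, first defective on the 8th item, H1: p < 0.4
p_maintenance = geometric_upper_tail(0.4, 8)

print(f"P(X >= 10) = {p_machine:.4f}, reject H0: {p_machine <= 0.05}")
print(f"P(X >= 8)  = {p_maintenance:.4f}, reject H0: {p_maintenance <= 0.05}")
# prints: P(X >= 10) = 0.0404, reject H0: True
# prints: P(X >= 8)  = 0.0280, reject H0: True
```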

6. Negative binomial distribution

A-level Further Maths: OCR MEI (Statistics)

Extends the geometric distribution to the number of trials needed to achieve the r-th success. It is examined in OCR MEI Further Maths and uses the same hypothesis-testing framework.

Setup

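Under H₀, X = number of trials to r-th success follows X ~ NB(r, p₀), with P(X = x) = C(x − 1, r − 1) p₀^r (1 − p₀)^(x−r) for x = r, r + 1, …

Decision rule

For H₁: p < p₀, reject H₀ if P(X ≥ x) ≤ α. This tail equals P(Y ≤ r − 1), where Y ~ B(x − 1, p₀) counts the successes in the first x − 1 trials. For H₁: p > p₀, reject H₀ if P(X ≤ x) ≤ α.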

Worked example

A seed has germination probability p = 0.6. It takes 7 trials to get the 3rd germination. Test at 10% whether germination probability has decreased.

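H₀: p = 0.6, H₁: p < 0.6. P(X ≥ 7) = P(at most 2 germinations in the first 6 trials) = P(Y ≤ 2) where Y ~ B(6, 0.6), which is 0.0041 + 0.0369 + 0.1382 = 0.1792.

Since 0.1792 > 0.10, there is insufficient evidence at the 10% level that germination probability has decreased.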

Try it yourself

A coin has probability p = 0.5 of heads. A suspected biased coin takes 8 flips to get the 2nd head. Test at 10% whether the probability of heads has decreased.

Answer

H₀: p = 0.5, H₁: p < 0.5. P(X ≥ 8) = P(at most 1 head in first 7 flips | p = 0.5) = P(Y ≤ 1) where Y ~ B(7, 0.5), so P(Y ≤ 1) = (1 + 7)/2⁷ = 8/128 = 0.0625.

Since 0.0625 < 0.10, there is sufficient evidence at 10% that the probability of heads has decreased.
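The conversion from a negative binomial tail to a binomial one, used in the coin answer above, can be written as a short function. This is a sketch under the stated identity P(X ≥ x) = P(fewer than r successes in the first x − 1 trials); the name `negbin_upper_tail` is illustrative.

```python
from math import comb

def negbin_upper_tail(r: int, p: float, x: int) -> float:
    """Return P(X >= x), where X is the trial on which the r-th success occurs.

    X >= x exactly when the first x - 1 trials contain at most r - 1 successes.
    """
    n = x - 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r))

# Coin question: p0 = 0.5, 2nd head on the 8th flip, H1: p < 0.5, alpha = 0.10
p_coin = negbin_upper_tail(2, 0.5, 8)
print(f"P(X >= 8) = {p_coin:.4f}, reject H0: {p_coin <= 0.10}")
# prints: P(X >= 8) = 0.0625, reject H0: True
```

With r = 1 the function reduces to the geometric tail (1 − p)^(x−1), which is a quick sanity check on the identity.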

7. Chi-squared test

A-level Further Maths: Edexcel (Further Statistics 1), AQA (Statistics), OCR (Statistics A), OCR MEI (Statistics)

The χ² test appears in two forms: goodness of fit (does data follow a proposed distribution?) and independence (are two categorical variables related?). The test statistic and decision rule are the same in both cases.

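Test statistic

χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ, which approximately follows χ²(ν) under H₀,

where Oᵢ are observed frequencies, Eᵢ are expected frequencies, and ν is the degrees of freedom: for goodness of fit, ν = (number of cells) − 1 − (number of parameters estimated from the data); for independence in an r × c contingency table, ν = (r − 1)(c − 1).

Decision rule

Reject H₀ if the observed χ² is at least the critical value of χ²(ν) at significance level α. The test is always upper-tailed, because large values of χ² indicate a poor fit.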

Goodness-of-fit example

A die is rolled 60 times. Each face is expected 10 times. Observed counts are (8, 7, 12, 9, 14, 10). Test at 5% whether the die is fair.

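χ² = (4 + 9 + 4 + 1 + 16 + 0)/10 = 3.4 with ν = 6 − 1 = 5. The 5% critical value of χ²(5) is 11.070.

Since 3.4 < 11.070, there is insufficient evidence at the 5% level that the die is unfair.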

Try it yourself

A tetrahedral die (4 faces) is rolled 40 times. Expected count per face: 10. Observed counts: (8, 12, 7, 13). Test at 5% whether the die is fair.

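Answer

χ² = (4 + 4 + 9 + 9)/10 = 2.6 with ν = 4 − 1 = 3. The 5% critical value of χ²(3) is 7.815.

Since 2.6 < 7.815, there is insufficient evidence at 5% that the die is unfair.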
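The χ² statistic is a single sum over cells, so it is easy to check by machine. The sketch below computes the statistic for both dice and compares it with critical values typed in from standard tables (the standard library has no χ² distribution, so the dictionary `CRITICAL_5_PERCENT` is a hand-entered assumption rather than a computed quantity).

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 5% upper critical values of chi-squared, copied from standard tables
CRITICAL_5_PERCENT = {3: 7.815, 5: 11.070}

# Six-sided die: 60 rolls, expected 10 per face, nu = 5
chi2_die = chi_squared_statistic([8, 7, 12, 9, 14, 10], [10] * 6)
# Tetrahedral die: 40 rolls, expected 10 per face, nu = 3
chi2_tetra = chi_squared_statistic([8, 12, 7, 13], [10] * 4)

print(f"die: {chi2_die:.1f}, reject H0: {chi2_die >= CRITICAL_5_PERCENT[5]}")
print(f"tetrahedral: {chi2_tetra:.1f}, reject H0: {chi2_tetra >= CRITICAL_5_PERCENT[3]}")
# prints: die: 3.4, reject H0: False
# prints: tetrahedral: 2.6, reject H0: False
```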

8. Spearman's rank correlation coefficient

A-level Further Maths: Edexcel (Further Statistics 1), AQA (Statistics), OCR MEI (Statistics)

A non-parametric alternative to the PMCC. It tests for monotonic association between two variables by ranking the data — no normality assumption is needed.

Setup

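The test statistic is:

r_s = 1 − 6Σdᵢ² / (n(n² − 1))

where dᵢ is the difference between ranks for each pair and n is the number of pairs. This formula assumes there are no tied ranks.

Decision rule

Reject H₀ (no association) if r_s is at least as extreme as the critical value from Spearman tables for the given n and significance level: r_s ≥ r_s,crit for positive association, or r_s ≤ −r_s,crit for negative association.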

Worked example

Seven students are ranked by two judges. The rank differences are (2, −1, 0, 1, −2, 1, −1), giving Σd² = 12. Test at 5% for positive association.

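r_s = 1 − 6(12)/(7 × 48) = 1 − 72/336 ≈ 0.786. The 5% one-tailed critical value for n = 7 is 0.7143.

Since 0.786 > 0.7143, there is (just) sufficient evidence at the 5% level of positive rank association.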

Try it yourself

Eight athletes are ranked by race time and by resting heart rate. The rank differences are (1, 0, −1, 2, −1, 0, 1, −2). Test at 5% for positive rank association.

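Answer

Σd² = 1 + 0 + 1 + 4 + 1 + 0 + 1 + 4 = 12, so r_s = 1 − 6(12)/(8 × 63) = 1 − 72/504 ≈ 0.857. The 5% one-tailed critical value for n = 8 is 0.6429.

Since 0.857 > 0.6429, there is sufficient evidence at 5% of positive rank association.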
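Spearman's coefficient depends on the data only through the rank differences, so both questions can be checked directly from the differences given. A minimal sketch, assuming no tied ranks; the function name is illustrative.

```python
def spearman_from_differences(differences):
    """Return r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)), assuming no tied ranks."""
    n = len(differences)
    sum_d_squared = sum(d * d for d in differences)
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))

# Seven students ranked by two judges
rs_judges = spearman_from_differences([2, -1, 0, 1, -2, 1, -1])
# Eight athletes ranked by race time and resting heart rate
rs_athletes = spearman_from_differences([1, 0, -1, 2, -1, 0, 1, -2])

print(f"judges:   r_s = {rs_judges:.4f}")    # prints: judges:   r_s = 0.7857
print(f"athletes: r_s = {rs_athletes:.4f}")  # prints: athletes: r_s = 0.8571
```

Each value is then compared with the tabulated critical value for its n, exactly as in a PMCC test.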

9. F distribution

A-level Further Maths: Edexcel (Further Statistics 2), AQA (Statistics), OCR MEI (Statistics)

Used to compare the variances of two independent normal populations. The F statistic is the ratio of two sample variances. By convention, the larger sample variance goes in the numerator so the observed F is always ≥ 1.

Setup

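Under H₀: σ₁² = σ₂², with s₁² denoting the larger sample variance, F = s₁²/s₂² ~ F(n₁ − 1, n₂ − 1).

Decision rule

Reject H₀ if F ≥ F_crit. For a two-tailed test (H₁: σ₁² ≠ σ₂²) at significance level α, F_crit is the upper α/2 critical value of F(ν₁, ν₂), because putting the larger variance in the numerator means only the upper tail is used. For a one-tailed test, use the upper α critical value.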

Worked example

Machine A: n₁ = 10, s₁² = 24. Machine B: n₂ = 12, s₂² = 8. Test at 10% whether variances differ.

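F = 24/8 = 3.0 with ν₁ = 9, ν₂ = 11. For a two-tailed test at 10%, the upper 5% critical value of F(9, 11) is approximately 2.90.

Since 3.0 > 2.90, there is sufficient evidence at the 10% level that the variances of the two machines differ.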

Try it yourself

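Machine A: n₁ = 8, s₁² = 25. Machine B: n₂ = 9, s₂² = 5. Test at 10% whether the variances differ.

Answer

F = 25/5 = 5.0 with ν₁ = 7, ν₂ = 8. For a two-tailed test at 10%, the upper 5% critical value of F(7, 8) is approximately 3.50.

Since 5.0 > 3.50, there is sufficient evidence at 10% that the variances of the two machines differ.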
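The only step in the F test that is easy to get wrong is the ordering: the larger variance goes on top and its degrees of freedom come first. The sketch below makes that explicit. The critical values are typed in from standard tables as an assumption, since the standard library has no F distribution; `f_statistic` is an illustrative name.

```python
def f_statistic(var_a, n_a, var_b, n_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

# Upper 5% critical values from standard F tables, keyed by (df_num, df_den)
UPPER_5_PERCENT = {(9, 11): 2.896, (7, 8): 3.500}

for label, args in [("worked example", (24, 10, 8, 12)), ("try it", (25, 8, 5, 9))]:
    f_value, df_num, df_den = f_statistic(*args)
    critical = UPPER_5_PERCENT[(df_num, df_den)]
    print(f"{label}: F = {f_value:.2f} on ({df_num}, {df_den}) df, "
          f"critical = {critical}, reject H0: {f_value >= critical}")
```

Swapping the two machines in the call leaves the result unchanged, which is the point of the convention.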

10. Student's t distribution

A-level Further Maths: Edexcel (Further Statistics 1), AQA (Statistics), OCR (Statistics A), OCR MEI (Statistics)

Used to test a population mean when the variance is unknown and must be estimated from the sample. This is more realistic than the z-test. A two-sample version tests whether two population means are equal.

One-sample setup

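Under H₀: μ = μ₀, with the population assumed normal, T = (X̄ − μ₀)/(S/√n) ~ t(n − 1), where S² is the unbiased sample variance.

Decision rule

Reject H₀ if the observed t lies in the critical region of t(n − 1): |t| ≥ t_crit for a two-tailed test, or t ≥ t_crit (respectively t ≤ −t_crit) for a one-tailed test, with t_crit read from tables for ν = n − 1.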

Worked example

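A fertiliser is claimed to produce a mean yield of μ = 40 kg. A trial on n = 16 plots gives x̄ = 38.2 kg, s = 3.6 kg. Test at 5% whether the mean yield differs from 40 kg.

H₀: μ = 40, H₁: μ ≠ 40. t = (38.2 − 40)/(3.6/√16) = −1.8/0.9 = −2.0 with ν = 15. The 5% two-tailed critical value of t(15) is 2.131.

Since |−2.0| < 2.131, there is insufficient evidence at the 5% level that the mean yield differs from 40 kg.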

Try it yourself

A drug claims to reduce blood pressure by μ = 15 mmHg. A trial on n = 10 patients gives x̄ = 11 mmHg, s = 6 mmHg. Test at 5% whether the true mean reduction differs from 15 mmHg.

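Answer

H₀: μ = 15, H₁: μ ≠ 15. t = (11 − 15)/(6/√10) ≈ −2.108 with ν = 9. The 5% two-tailed critical value of t(9) is 2.262.

Since |−2.108| < 2.262, there is insufficient evidence at 5% that the true mean reduction differs from 15 mmHg.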
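The one-sample t statistic has the same shape as the z statistic, with s in place of σ. The sketch below computes it for both questions and compares against two-tailed critical values typed in from standard t tables (an assumption, since the standard library has no t distribution); the names are illustrative.

```python
from math import sqrt

def t_statistic(sample_mean: float, mu0: float, s: float, n: int) -> float:
    """Return t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (s / sqrt(n))

# 5% two-tailed critical values from standard t tables, keyed by degrees of freedom
TWO_TAILED_5_PERCENT = {9: 2.262, 15: 2.131}

t_yield = t_statistic(38.2, 40, 3.6, 16)  # fertiliser example, nu = 15
t_drug = t_statistic(11, 15, 6, 10)       # blood pressure question, nu = 9

print(f"yield: t = {t_yield:.3f}, reject H0: {abs(t_yield) >= TWO_TAILED_5_PERCENT[15]}")
print(f"drug:  t = {t_drug:.3f}, reject H0: {abs(t_drug) >= TWO_TAILED_5_PERCENT[9]}")
# prints: yield: t = -2.000, reject H0: False
# prints: drug:  t = -2.108, reject H0: False
```

Note that t = −2.0 would be significant under the z-test (critical value 1.96) but is not under t(15), which reflects the extra uncertainty from estimating the variance.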

Two-sample t-test

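To compare the means of two independent populations with unknown but equal variances (pooled t-test), under H₀: μ₁ = μ₂ the test statistic is

T = (X̄₁ − X̄₂) / (S_p √(1/n₁ + 1/n₂)) ~ t(n₁ + n₂ − 2),

where S_p² = ((n₁ − 1)S₁² + (n₂ − 1)S₂²) / (n₁ + n₂ − 2) is the pooled estimate of the common variance.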

11. Wilcoxon rank-sum test (Mann–Whitney)

A-level Further Maths: Edexcel (Further Statistics 2), AQA (Statistics), OCR MEI (Statistics)

A non-parametric test for comparing two independent populations, useful when data are not normally distributed. It tests whether the two populations have the same distribution; if the two distributions are assumed to have the same shape, this amounts to testing whether they have the same median.

Procedure

  1. Combine all n₁ + n₂ observations and rank them from 1 to n₁ + n₂.
  2. Sum the ranks for one group: W = sum of ranks for group 1.
  3. Compare W to the critical values Wlower and Wupper from tables.

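Decision rule

For a two-tailed test, reject H₀ if W ≤ Wlower or W ≥ Wupper. For a one-tailed test, compare W with the single critical value in the direction of H₁.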

Worked example

Group A (n₁ = 4): scores 3, 7, 11, 15. Group B (n₂ = 4): scores 2, 6, 8, 14. Test at 5% (two-tailed) whether the medians differ.

Combined ranks (score → rank): 2→1, 3→2, 6→3, 7→4, 8→5, 11→6, 14→7, 15→8.

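Group A has ranks 2, 4, 6, 8, so W = 2 + 4 + 6 + 8 = 20.

Critical values for n₁ = n₂ = 4, 5% two-tailed: Wlower = 10, Wupper = 26. (Of the 70 equally likely rank assignments, P(W ≤ 10) = 1/70 ≈ 0.014, whereas P(W ≤ 11) = 2/70 ≈ 0.029 exceeds 0.025.)

Since 10 < 20 < 26, there is insufficient evidence at the 5% level that the medians differ.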
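For samples this small, the null distribution of W can be enumerated exactly: under H₀ every choice of n₁ ranks out of n₁ + n₂ is equally likely. The sketch below follows the three-step procedure and then computes an exact two-sided p-value by doubling the smaller tail, which is one common convention and an assumption here rather than something the tables prescribe. It also assumes no tied observations; the function names are illustrative.

```python
from itertools import combinations

def rank_sum(group_1, group_2):
    """Steps 1-2: rank the combined sample and sum the ranks of group_1 (no ties)."""
    ranks = {value: i + 1 for i, value in enumerate(sorted(group_1 + group_2))}
    return sum(ranks[value] for value in group_1)

def exact_two_sided_p(w, n1, n2):
    """Exact p-value under H0, enumerating every possible set of n1 ranks."""
    sums = [sum(c) for c in combinations(range(1, n1 + n2 + 1), n1)]
    lower = sum(s <= w for s in sums) / len(sums)
    upper = sum(s >= w for s in sums) / len(sums)
    return min(1.0, 2 * min(lower, upper))

group_a = [3, 7, 11, 15]
group_b = [2, 6, 8, 14]
w = rank_sum(group_a, group_b)
p_value = exact_two_sided_p(w, len(group_a), len(group_b))
print(f"W = {w}, exact two-sided p = {p_value:.3f}, reject H0: {p_value <= 0.05}")
```

Enumeration is only practical for small samples, since the number of rank assignments grows combinatorially; for larger samples tables or a normal approximation are used instead.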

Summary: the same question, eleven ways

Test                Level     Model under H₀     "More extreme" means
Binomial            Maths     X ~ B(n, p₀)       X ≥ x or X ≤ x
Normal z-test       Maths     Z ~ N(0, 1)        |Z| ≥ z or Z ≥ z
PMCC                Maths     ρ = 0              |r| ≥ r_crit
Poisson             Further   X ~ Po(λ₀)         X ≥ x or X ≤ x
Geometric           Further   X ~ Geo(p₀)        X ≤ x or X ≥ x
Negative binomial   Further   X ~ NB(r, p₀)      X ≥ x or X ≤ x
Chi-squared         Further   χ² ~ χ²(ν)         χ² ≥ χ²_obs
Spearman's rank     Further   ρ_s = 0            |r_s| ≥ r_s,crit
F distribution      Further   F ~ F(ν₁, ν₂)      F ≥ F_crit
Student's t         Further   T ~ t(n−1)         |T| ≥ t_crit
Wilcoxon rank sum   Further   same median        W ≤ W_lo or W ≥ W_hi